H2OVL-Mississippi-2B is a general-purpose vision-language model developed by H2O.ai for a wide range of multimodal tasks. It has 2 billion parameters and targets tasks such as image captioning, visual question answering (VQA), and document understanding, where H2O.ai reports strong performance.
Tags: Image-to-Text, Transformers, English